Indexing and Querying XML Data for Regular Path Expressions

Authors

  • Quanzhong Li
  • Bongki Moon

Abstract

With the advent of XML as a standard for data representation and exchange on the Internet, storing and querying XML data is becoming increasingly important. Several XML query languages have been proposed, and a feature common to these languages is the use of regular path expressions to query XML data. This poses a new challenge for indexing and searching XML data, because conventional approaches based on tree traversals may not meet the processing requirements under a heavy load of access requests. In this paper, we propose a new system for indexing and storing XML data based on a numbering scheme for elements. This numbering scheme allows the ancestor-descendant relationship between elements in the XML hierarchy to be determined quickly. We also propose several algorithms for processing regular path expressions, namely (1) EE-Join, for searching paths from one element to another; (2) EA-Join, for scanning sorted elements and attributes to find element-attribute pairs; and (3) KC-Join, for finding the Kleene closure of repeated paths or elements. The EE-Join algorithm is particularly effective for searching paths that are very long or whose lengths are unknown. Experimental results from our prototype system implementation show that the proposed algorithms can process XML queries with regular path expressions by up to an or…

This work was sponsored in part by National Science Foundation CAREER Award (IIS-9876037) and Research Infrastructure program EIA-0080123. The authors assume all responsibility for the contents…
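
The abstract does not spell out how the numbering scheme or EE-Join work internally. As a rough illustration only, the following sketch assumes one common interval-style scheme: each element gets a pair (order, size), where order is its preorder rank and size is its number of descendants, so that x is an ancestor of y exactly when order(x) < order(y) ≤ order(x) + size(x). The join is a generic stack-based merge over two order-sorted element lists, swapped in as a stand-in for EE-Join rather than the authors' algorithm. The names `Node`, `number_tree`, `is_ancestor`, `elements_by_tag`, and `ee_join` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A hypothetical XML element: tag name, children, and its number pair."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    order: int = 0  # preorder rank, set by number_tree
    size: int = 0   # number of descendants, set by number_tree


def number_tree(node: Node, next_order: int = 0) -> int:
    """Assign (order, size) to every node in preorder; return the next unused rank."""
    node.order = next_order
    next_order += 1
    for child in node.children:
        next_order = number_tree(child, next_order)
    node.size = next_order - node.order - 1  # ranks used by the subtree, minus the node itself
    return next_order


def is_ancestor(a: Node, d: Node) -> bool:
    """Constant-time test: d lies strictly inside a's preorder interval."""
    return a.order < d.order <= a.order + a.size


def elements_by_tag(root: Node, tag: str) -> List[Node]:
    """Collect elements with the given tag in preorder, i.e. sorted by `order`."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.tag == tag:
            out.append(node)
        stack.extend(reversed(node.children))  # leftmost child is popped first
    return out


def ee_join(ancestors: List[Node], descendants: List[Node]) -> List[Tuple[Node, Node]]:
    """Stack-based merge of two order-sorted lists into (ancestor, descendant) pairs.

    Hypothetical stand-in for EE-Join, not the paper's algorithm.
    The stack always holds a chain of nested elements.
    """
    result: List[Tuple[Node, Node]] = []
    stack: List[Node] = []
    i = 0
    for d in descendants:
        # Push every candidate ancestor that starts before d.
        while i < len(ancestors) and ancestors[i].order < d.order:
            a = ancestors[i]
            # Discard entries whose subtree ends before a starts.
            while stack and stack[-1].order + stack[-1].size < a.order:
                stack.pop()
            stack.append(a)
            i += 1
        # Discard entries whose subtree ends before d.
        while stack and stack[-1].order + stack[-1].size < d.order:
            stack.pop()
        # Every element left on the stack contains d.
        result.extend((anc, d) for anc in stack)
    return result


if __name__ == "__main__":
    doc = Node("book", [
        Node("chapter", [Node("section", [Node("title")]), Node("section")]),
        Node("chapter", [Node("title")]),
    ])
    number_tree(doc)
    pairs = ee_join(elements_by_tag(doc, "chapter"), elements_by_tag(doc, "title"))
    print([(a.order, d.order) for a, d in pairs])  # [(1, 3), (5, 6)]
```

Because both input lists are sorted by order, each candidate ancestor is pushed and popped at most once, so apart from producing the output the merge is linear in the input lengths. In a system like the one described, the per-tag lists would presumably come from an index rather than from a tree walk.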

Similar resources

iXUPT: Indexing XML Using Path Templates

The XML format has become the standard for data exchange because it is self-describing and stores not only information but also the relationships between data. It is therefore used in very different areas. To find the right information in an XML file, we need fast and effective access to the data. As in relational databases, we can create an index to speed up the query...

Regular Path Expression for Querying Semistructured Data - Implementation in Prolog

We present regular path expressions (RPE), a language for querying data graphs, and its context-free grammar implementation in Prolog. A proof-of-concept parser and query tool is implemented, and various usage examples are analyzed for semistructured data formats such as XML and JSON.

Indexing and Querying Semistructured Data Views of Relational Database

The most promising and dominant format for processing and representing data on the Internet is the semistructured data format termed XML. XML data has no fixed schema; it evolves and is self-describing, which results in management difficulties compared to, for example, relational data. XML queries differ from relational queries in that the former are expressed as path expressions. The efficien...

Indexing XML Data with UB-trees

In database terminology, XML can be viewed as a language for data modelling. To retrieve XML data from XML databases, several query languages have been proposed. The common feature of these languages is the use of regular path expressions, which allow users to navigate through arbitrarily long paths in the data. Several index structures for XM...

Indexing XML to Support Path Expressions

The Extensible Markup Language (XML) is rapidly becoming a dominant technology in the area of data-intensive applications. Although several implementations are already offered in commercial products, especially DBMSs, there are still open research issues related to the efficiency of XML storage and retrieval. This paper introduces and analyses new index structures suitable for support of regular ...

Publication date: 2001